48 research outputs found

    Toward automatic censorship detection in microblogs

    Social media is an area where users often experience censorship through a variety of means such as the restriction of search terms or active and retroactive deletion of messages. In this paper we examine the feasibility of automatically detecting censorship of microblogs. We use a network growing model to simulate discussion over a microblog follow network and compare two censorship strategies to simulate varying levels of message deletion. Using topological features extracted from the resulting graphs, a classifier is trained to detect whether or not a given communication graph has been censored. The results show that censorship detection is feasible under empirically measured levels of message deletion. The proposed framework can enable automated censorship measurement and tracking, which, when combined with aggregated citizen reports of censorship, can allow users to make informed decisions about online communication habits. Comment: 13 pages. Updated with example cascades figure and typo fixes. To appear at the International Workshop on Data Mining in Social Networks (PAKDD-SocNet) 201

    Spoken affect classification : algorithms and experimental implementation : a thesis presented in partial fulfilment of the requirements for the degree of Master of Science in Computer Science at Massey University, Palmerston North, New Zealand

    Machine-based emotional intelligence is a requirement for natural interaction between humans and computer interfaces, and a basic level of accurate emotion perception is needed for computer systems to respond adequately to human emotion. Humans convey emotional information both intentionally and unintentionally via speech patterns. These vocal patterns are perceived and understood by listeners during conversation. This research aims to improve the automatic perception of vocal emotion in two ways. First, we compare two emotional speech data sources: natural, spontaneous emotional speech and acted or portrayed emotional speech. This comparison demonstrates the advantages and disadvantages of both acquisition methods and how these methods affect the end application of vocal emotion recognition. Second, we look at two classification methods which have gone unexplored in this field: stacked generalisation and unweighted vote. We show how these techniques can yield an improvement over traditional classification methods.
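
    The unweighted vote named in the abstract above can be illustrated with a minimal sketch. It assumes each base classifier emits one label per utterance and that ties go to the earliest prediction; the function name, the tie-breaking rule, and the example labels are assumptions made for illustration and are not taken from the thesis.

```python
from collections import Counter


def unweighted_vote(predictions):
    """Return the label chosen by the most base classifiers.

    Every classifier gets exactly one vote. Ties are broken in favour of
    the label that appears first in `predictions` (an assumed rule).
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


# Hypothetical outputs of three base classifiers for a single utterance.
base_predictions = ["angry", "neutral", "angry"]
print(unweighted_vote(base_predictions))
```

    Stacked generalisation differs in that a second-level learner is trained on the base classifiers' outputs rather than counting them equally; that step is not shown here.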

    Phantom cascades: The effect of hidden nodes on information diffusion

    Research on information diffusion generally assumes complete knowledge of the underlying network. However, in the presence of factors such as increasing privacy awareness, restrictions on application programming interfaces (APIs) and sampling strategies, this assumption rarely holds in the real world, which in turn leads to an underestimation of the size of information cascades. In this work we study the effect of hidden network structure on information diffusion processes. We characterise information cascades through activation paths traversing visible and hidden parts of the network. We quantify diffusion estimation error while varying the amount of hidden structure in five empirical and synthetic network datasets and demonstrate the effect of topological properties on this error. Finally, we suggest practical recommendations for practitioners and propose a model to predict the cascade size with minimal information regarding the underlying network. Comment: Preprint submitted to Elsevier Computer Communication
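
    The underestimation described in the abstract above can be shown on a toy graph. This is a sketch under strong assumptions: spread is deterministic (every edge transmits), whereas the paper's diffusion model is not specified here, and the graph, node names, and helper functions are all hypothetical.

```python
from collections import deque


def cascade(graph, seed):
    """Return the set of nodes reached from `seed` by breadth-first spread.

    Assumes every edge transmits, which is a simplification.
    """
    activated = {seed}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in activated:
                activated.add(neighbour)
                queue.append(neighbour)
    return activated


def visible_subgraph(graph, hidden):
    """Drop hidden nodes and every edge that touches them."""
    return {
        node: [n for n in neighbours if n not in hidden]
        for node, neighbours in graph.items()
        if node not in hidden
    }


# Hypothetical directed follow graph; node "b" is hidden from the observer.
full = {"a": ["b", "e"], "b": ["c"], "c": ["d"], "d": [], "e": []}
hidden = {"b"}

true_size = len(cascade(full, "a"))
observed_size = len(cascade(visible_subgraph(full, hidden), "a"))
print(true_size, observed_size, true_size - observed_size)
```

    Here the only activation path to "c" and "d" passes through the hidden node, so an observer restricted to the visible graph misses them along with "b" itself.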

    Topic modelling of clickthrough data in image search

    In this paper we explore the benefits of latent variable modelling of clickthrough data in the domain of image retrieval. Clicks in image search logs are regarded as implicit relevance judgements that express both user intent and important relations between selected documents. We posit that clickthrough data contains hidden topics and can be used to infer a lower dimensional latent space that can be subsequently employed to improve various aspects of the retrieval system. We use a subset of a clickthrough corpus from the image search portal of a news agency to evaluate several popular latent variable models in terms of their ability to model topics underlying queries. We demonstrate that latent variable modelling reveals underlying structure in clickthrough data and our results show that computing document similarities in the latent space improves retrieval effectiveness compared to computing similarities in the original query space. These results are compared with baselines using visual and textual features. We show performance substantially better than the visual baseline, which indicates that content-based image retrieval systems that do not exploit query logs could improve recall and precision by taking this historical data into account.

    Latent variable modelling of user interaction in image retrieval

    This thesis studies latent variable models of user interactions with the aim of improving image retrieval. Search histories, called query logs, in which the interaction between users and the retrieval system is recorded, often contain indications of intent in the form of relevance judgements given on documents in the context of a search. Depending on the nature of the retrieval system and the interaction it allows, these judgements may be explicit or implicit, and, once aggregated over a large number of searches by many users, they can be exploited to improve various aspects of the retrieval system. This thesis proposes a model of query logs, the User Relevance Model, in which relevance judgements arise from a generative process whereby the user judges (either implicitly or explicitly) a document as relevant if it shares a degree of overlap with the query in terms of concepts, and as non-relevant otherwise.

    Semantic clustering of images using patterns of relevance feedback

    User-supplied data such as browsing logs, click-through data, and relevance feedback judgements are an important source of knowledge during semantic indexing of documents such as images and video. Low-level indexing and abstraction methods are limited in how they can deal with semantic data. In this paper, in the context of such semantic data, we apply latent semantic analysis to two forms of user-supplied data, real-world and artificially generated relevance feedback judgements, in order to examine the validity of using artificially generated interaction data for the study of semantic image clustering.

    Evolutionary Clustering and Analysis of User Behaviour in Online Forums

    In this paper we cluster and analyse temporal user behaviour in online communities. We adapt a simple unsupervised clustering algorithm to an evolutionary setting where we cluster users into prototypical behavioural roles based on features derived from their ego-centric reply-graphs. We then analyse changes in the role membership of the users over time and the change in role composition of forums over time, and examine the differences between forums in terms of role composition. We perform this analysis on 200 forums from a popular national bulletin board and 14 enterprise technical support forums.

    L.C.D.: Ensemble methods for spoken emotion recognition in call-centers. Speech communication 49,

    Machine-based emotional intelligence is a requirement for more natural interaction between humans and computer interfaces, and a basic level of accurate emotion perception is needed for computer systems to respond adequately to human emotion. Humans convey emotional information both intentionally and unintentionally via speech patterns. These vocal patterns are perceived and understood by listeners during conversation. This research aims to improve the automatic perception of vocal emotion in two ways. First, we compare two emotional speech data sources: natural, spontaneous emotional speech and acted or portrayed emotional speech. This comparison demonstrates the advantages and disadvantages of both acquisition methods and how these methods affect the end application of vocal emotion recognition. Second, we look at two classification methods which have not been applied in this field: stacked generalisation and unweighted vote. We show how these techniques can yield an improvement over traditional classification methods.